Microsoft
has invested heavily in .NET Framework 4 and Visual Studio 2010 to
allow all developers to safely and easily embrace parallel programming
concepts within applications. The main driver for the Parallel
Programming model is simply that the era of massive doubling and
redoubling of processor clock speed is apparently over, and therefore
the era of our code automatically doubling in speed as hardware
improves is also coming to an end. This free performance ride is
popularly attributed to Moore's Law (proposed by Intel co-founder
Gordon Moore, who predicted a doubling of transistor density roughly
every two years), and its breakdown, as it is commonly known, was
brought about by thermal and power constraints within the silicon
processor at the heart of a PC. To counter this roadblock and to allow
computing speed to continue to scale, processor manufacturers and
computer hardware designers have simply added more than one processor
core to the hardware that runs code.
However,
not all programming techniques, languages, compilers, or developers for
that matter automatically produce code that takes advantage of multiple
cores. This leaves a lot of unused processing power on the table: CPU
processing power that could improve the responsiveness and user
experience of applications.
Microsoft’s MSDN Magazine published an article titled “Paradigm Shift—Design Considerations for Parallel Programming” by David Callahan,
which offers insight into the drivers behind, and Microsoft’s approach
to, parallel programming techniques. It begins by setting the scene:
...today,
performance is improved by the addition of processors. So-called
multicore systems are now ubiquitous. Of course, the multicore approach
improves performance only when software can perform multiple activities
at the same time. Functions that perform perfectly well using
sequential techniques must be written to allow multiple processors to
be used if they are to realize the performance gains promised by the
multiprocessor machines.
And concludes with the call to action:
The
shift to parallelism is an inflection point for the software industry
where new techniques must be adopted. Developers must embrace
parallelism for those portions of their applications that are time
sensitive today or that are expected to run on larger data sets
tomorrow.
The
main take-away regarding parallel programming drivers is that there is
no longer a free application performance boost with every hardware CPU
speed upgrade; for applications to run faster in the future,
programming techniques that support multiple processors (and cores)
need to be the standard approach. The techniques employed must also not
be limited to the number of CPU cores the code was originally authored
for. The application needs to detect and automatically embrace the
cores available on the executing hardware, whether that be 2 cores or
64 cores (and tomorrow's machines will likely offer orders of magnitude
more processing power). Code must be authored in a way that scales
accordingly without specifically compiled versions.
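This core-agnostic approach is exactly what the .NET Framework 4 Task Parallel Library is designed for. As a minimal sketch (the squaring loop is just a placeholder workload), the same binary adapts to however many cores the machine exposes:

```csharp
using System;
using System.Threading.Tasks;

class ScalingSketch
{
    static void Main()
    {
        // Environment.ProcessorCount reports the logical CPUs the OS
        // exposes, whether that is 2 or 64; no recompilation required.
        Console.WriteLine("Logical CPUs: " + Environment.ProcessorCount);

        // Parallel.For partitions the iteration range across the cores
        // available at run time, so the same code scales with hardware.
        long[] squares = new long[1000];
        Parallel.For(0, squares.Length, i =>
        {
            squares[i] = (long)i * i; // placeholder for real per-item work
        });

        Console.WriteLine(squares[999]); // 998001
    }
}
```

The point of the sketch is that no core count appears anywhere in the source; the runtime decides how to spread the iterations across the hardware it finds.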
History of Processor Speed and Multicore Processors
Looking
back at processor history, there was a 1GHz processor in 2000, which
doubled in speed to 2GHz in 2001 and topped 3GHz in 2002; however,
there has been a long hiatus since then in processor speed increasing
at that rate. In 2008, processor clock speeds were still only just
approaching 4GHz. In fact, when clock speed stopped increasing, so did
manufacturers' marketing of processor speed; clock speed was replaced
by various measures of instructions per second.
Power consumption, heat dissipation, and memory latency are just some
of the limiting factors halting pure CPU clock-speed increases. Another
technique for improving CPU performance had to be found in order to
keep pace with consumer demand.
The
limit of pure clock-speed scaling wasn't a surprise to the industry as
a whole; Intel engineers, in an article published in the October 1989
issue of IEEE Spectrum (“Microprocessors Circa 2000”), predicted the
use of multicore processor architecture to improve the end-user
experience when using PCs. Intel delivered on that promise in 2005, as
did competing processor companies, and it is almost certain that any
computer bought today has multiple cores built into each microprocessor
chip and, for the lucky few, multiple microprocessor chips built into
the motherboard. Rather than straight improvement in processor clock
speed, there are now more processor cores to do the work. Intel's
whitepaper “Intel Multi-Core Processor Architecture Development
Backgrounder” clearly defines what “multicore processors” consist of:
Explained
most simply, multi-core processor architecture entails silicon design
engineers placing two or more Intel Pentium processor-based “execution
cores,” or computational engines, within a single processor. This
multi-core processor plugs directly into a single processor socket, but
the operating system perceives each of its execution cores as a
discrete logical processor with all the associated execution resources.
The
idea behind this implementation of the chip’s internal architecture is
in essence a “divide and conquer” strategy. In other words, by divvying
up the computational work performed by the single Pentium
microprocessor core in traditional microprocessors and spreading it
over multiple execution cores, a multi-core processor can perform more
work within a given clock cycle. Thus, it
is designed to deliver a better overall user experience. To enable this
improvement, the software running on the platform must be written such
that it can spread its workload across multiple execution cores. This
functionality is called thread-level parallelism or “threading.”
Applications and operating systems (such as Microsoft Windows XP) that
are written to support it are referred to as “threaded” or
“multi-threaded.”
The
final sentence of this quote is important: “Applications and operating
systems...that are written to support it...” Although the operating
system running your code almost certainly supports multi-threading, not
all applications are coded in a fashion that fully exploits that
ability. In fact, in most cases the current use of multi-threading in
applications is to improve the perceived performance of an application
rather than its actual performance, a subtle distinction to be explored
shortly.
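The thread-level parallelism the whitepaper describes can be sketched with nothing more than the System.Threading.Thread class; the range-summing work here is a hypothetical stand-in for a real workload:

```csharp
using System;
using System.Threading;

class ThreadingSketch
{
    static long sumLow, sumHigh;

    static void Main()
    {
        // Divide the work into two activities that the operating system
        // can schedule on different execution cores at the same time.
        Thread low = new Thread(() => { sumLow = Sum(1, 500000); });
        Thread high = new Thread(() => { sumHigh = Sum(500001, 1000000); });

        low.Start();
        high.Start();

        // Wait for both threads to finish before combining the results.
        low.Join();
        high.Join();

        Console.WriteLine(sumLow + sumHigh); // 500000500000
    }

    static long Sum(int from, int to)
    {
        long total = 0;
        for (int i = from; i <= to; i++) total += i;
        return total;
    }
}
```

On a single-core machine the two threads simply time-slice, which is why threading has historically improved perceived rather than actual performance; on a multicore machine they can genuinely run at the same time.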
The
operating system running on your PC (or server) exploits all processor
cores in all the physical processors available to it, aggregating these
as a total number of available CPUs. For example, a 4-processor machine
running 4-core processors shows in the operating system as 16 CPUs
(multiply the number of physical processors by the number of cores in
each processor). Sixteen CPUs is now common in server machines, and the
number of CPUs is increasing due to growth in both physical socket
counts and per-processor core counts. Expect machines with 32, 64, or
even 128+ CPUs to be available at a commercial level now and at a
consumer level shortly.
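The aggregated CPU count the operating system reports is exposed directly to .NET code through Environment.ProcessorCount; the value printed depends entirely on the machine this sketch runs on:

```csharp
using System;

class CpuCount
{
    static void Main()
    {
        // Environment.ProcessorCount returns the total logical CPUs the
        // OS reports: physical sockets multiplied by cores per socket
        // (doubled again where Hyper-Threading or similar is enabled).
        int cpus = Environment.ProcessorCount;
        Console.WriteLine("This machine exposes " + cpus + " CPUs.");
    }
}
```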